Efficient Rolling AUC-PR implementation #1543

Open: wants to merge 3 commits into main

@davidlpgomes (Contributor) commented May 10, 2024

A C++ implementation of the prequential (rolling) AUC-PR; the code is compiled with Cython.

It maintains a sliding window of size S and computes the exact (i.e., not approximated) AUC-PR over the last S instances seen.
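To make the windowing concrete, here is a minimal C++ sketch. It assumes AUC-PR is computed as step-wise average precision, meaning the sum over distinct score thresholds of the recall increase times the precision at that threshold (the definition used by scikit-learn's `average_precision_score`). It is not the PR's algorithm, and the class `RollingAucPr` with its `update`/`get` methods is a hypothetical name. A `std::deque` remembers arrival order so the oldest instance can be evicted. A `std::multiset` keeps the window sorted by score, so each update costs O(log S) and each query is one O(S) pass with no re-sort.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <utility>

// Hypothetical sketch (not the PR's algorithm): exact AUC-PR, computed as
// step-wise average precision, over the last `window_size` instances.
class RollingAucPr {
 public:
  explicit RollingAucPr(std::size_t window_size) : window_size_(window_size) {
    assert(window_size > 0 && "window must hold at least one instance");
  }

  // Add one (score, label) pair; evict the oldest once the window is full.
  void update(double score, bool label) {
    const std::pair<double, bool> instance(score, label);
    arrival_order_.push_back(instance);
    by_score_.insert(instance);
    if (arrival_order_.size() > window_size_) {
      const std::pair<double, bool> oldest = arrival_order_.front();
      arrival_order_.pop_front();
      // Equal (score, label) pairs are interchangeable, so erasing any one copy is fine.
      by_score_.erase(by_score_.find(oldest));
    }
  }

  // Exact average precision of the current window (0 if it has no positives).
  double get() const {
    std::size_t total_pos = 0;
    for (const auto& inst : by_score_) {
      if (inst.second) ++total_pos;
    }
    if (total_pos == 0) return 0.0;

    double ap = 0.0;
    std::size_t tp = 0, fp = 0;
    // Walk from the highest score down; tied scores form a single threshold.
    auto it = by_score_.rbegin();
    while (it != by_score_.rend()) {
      const double threshold = it->first;
      std::size_t new_tp = 0;
      while (it != by_score_.rend() && it->first == threshold) {
        if (it->second) ++new_tp; else ++fp;
        ++it;
      }
      tp += new_tp;
      // Recall gain (new_tp / total_pos) weighted by precision at this threshold.
      ap += (static_cast<double>(new_tp) / static_cast<double>(total_pos)) *
            (static_cast<double>(tp) / static_cast<double>(tp + fp));
    }
    return ap;
  }

 private:
  std::size_t window_size_;
  std::deque<std::pair<double, bool>> arrival_order_;  // FIFO order, for eviction
  std::multiset<std::pair<double, bool>> by_score_;    // window sorted by score
};
```

For example, with S = 3 and the stream (0.9, positive), (0.8, negative), (0.7, positive), `get()` returns 5/6. After a fourth instance (0.6, negative) evicts the first one, it returns 0.5.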

Based on Gomes, Grégio, Alves, and Almeida, 2023.

@AdilZouitine (Member)

Hey, great contribution! 😄 Could you provide some benchmarks showing how much faster the rolling AUC computation is?

@davidlpgomes (Contributor, Author)

Hey @AdilZouitine, thanks! My team and I are the authors of the paper mentioned above.

In the paper, we ran several experiments on various stream datasets, comparing our prequential algorithm with the batch version (as well as with scikit-learn's batch implementation). On average, with a window of size 1000, our algorithm was 13 times faster and used 12 times less energy than the batch algorithm.

I will implement a simple stream experiment comparing the time spent computing the AUC-PR with our prequential algorithm versus the batch version. I'll send the link to the repository when I'm done 😃
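A stream experiment of this kind can be sketched as a small timing harness. It pre-generates synthetic (score, label) pairs, feeds them one at a time to a callback that updates a metric and returns its current value, and measures wall-clock time with `std::chrono::steady_clock`. Everything here is a hypothetical choice: the names `time_stream` and `StreamTiming`, the uniform scores, and the 30% positive rate. None of it is the setup used in the paper or in the benchmark repository.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Hypothetical result of one timed run.
struct StreamTiming {
  double seconds;     // elapsed wall-clock time for the whole stream
  double last_value;  // metric value returned after the final instance
};

// Hypothetical harness: `step` should update a metric with one instance and
// return the metric's current value (prequential evaluation).
StreamTiming time_stream(std::size_t n,
                         const std::function<double(double, bool)>& step,
                         unsigned seed = 42) {
  assert(step && "step callback must be set");
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> score_dist(0.0, 1.0);
  std::bernoulli_distribution label_dist(0.3);  // assumed 30% positive rate

  // Generate the stream up front so that data generation is not timed.
  std::vector<std::pair<double, bool>> stream;
  stream.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double score = score_dist(rng);
    const bool label = label_dist(rng);
    stream.emplace_back(score, label);
  }

  double value = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (const auto& instance : stream) {
    value = step(instance.first, instance.second);  // update + query per instance
  }
  const auto end = std::chrono::steady_clock::now();
  return {std::chrono::duration<double>(end - start).count(), value};
}
```

To compare the two approaches, one callback would update a rolling metric and query it, while the other would append to a fixed-size window and recompute the AUC-PR from scratch. Running both with the same seed feeds them identical streams.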

@davidlpgomes (Contributor, Author) commented May 16, 2024

Hey @AdilZouitine, the benchmarks (code and results) comparing the Rolling AUC-PR and the Batch AUC-PR are available in my benchmark-aucpr repository.

The Rolling algorithm is the same as in this contribution, with some unused functions removed.
The Batch AUC-PR function uses a similar algorithm but does not store a window of samples; instead, it receives the scores and y_true as parameters.
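A batch counterpart can be sketched as a function that takes the scores and y_true directly and computes step-wise average precision from scratch. The name `batch_aucpr` and the 0/1 label encoding are assumptions for illustration, and the actual function in the benchmark-aucpr repository may differ. Sorting dominates, so each call costs O(n log n). A batch evaluator pays that cost every time the window moves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical batch AUC-PR (step-wise average precision); labels are 0/1.
double batch_aucpr(const std::vector<double>& scores, const std::vector<int>& y_true) {
  assert(scores.size() == y_true.size());

  // Indices sorted by descending score.
  std::vector<std::size_t> order(scores.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

  std::size_t total_pos = 0;
  for (int y : y_true) {
    if (y == 1) ++total_pos;
  }
  if (total_pos == 0) return 0.0;  // AP is undefined without positives; return 0

  double ap = 0.0;
  std::size_t tp = 0, fp = 0, i = 0;
  while (i < order.size()) {
    const double threshold = scores[order[i]];
    std::size_t new_tp = 0;
    // Consume every instance tied at this threshold before computing precision.
    while (i < order.size() && scores[order[i]] == threshold) {
      if (y_true[order[i]] == 1) ++new_tp; else ++fp;
      ++i;
    }
    tp += new_tp;
    // Recall gain (new_tp / total_pos) weighted by precision at this threshold.
    ap += (static_cast<double>(new_tp) / static_cast<double>(total_pos)) *
          (static_cast<double>(tp) / static_cast<double>(tp + fp));
  }
  return ap;
}
```

For instance, `batch_aucpr({0.9, 0.8, 0.7}, {1, 0, 1})` returns 5/6. Tied scores count as a single threshold, so `batch_aucpr({0.5, 0.5}, {1, 0})` returns 0.5.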

In the benchmarks, both are called directly from C++, i.e., without going through Cython/Python.
